Abstract. The best way to reconcile political actors in a controversial electoral process is a full audit. When this is not possible, statistical tools may be useful for measuring the likelihood of the results. The Venezuelan recall referendum (2004) provides a suitable dataset for thinking about this important problem. The cost of errors in examining an allegation of electoral fraud can be enormous: they can range from legitimizing an unfair election to supporting an unfounded accusation, with serious political implications. For this reason, we must be very selective about the data, hypotheses and test statistics that will be used. This article offers a critical review of recent statistical literature on the Venezuelan referendum. In addition, we propose a testing methodology, based exclusively on vote counting, that is potentially useful in election forensics. The referendum is reexamined, adding new and intriguing aspects to previous analyses. The main conclusion is that there were a significant number of irregularities in the vote counting that introduced a bias in favor of the winning option. A plausible scenario in which the irregularities could overturn the results is also discussed.

Key words and phrases: Election forensics, Venezuelan presidential 
elections, Benford's Law, multivariate hypergeometric distribution. 



1. INTRODUCTION 

The statistical controversies surrounding the outcomes of the Venezuelan referendum, convened to revoke the mandate of President Chavez on August 15, 2004, generated a long spate of articles in newspapers and occupied significant television time. A Google search with the exact phrase "Venezuelan recall referendum" shows more than 100,000 hits in English. Several reports, commissioned by different organizations, reached opposite conclusions: roughly speaking, either a fraud may have occurred during the referendum or, on the contrary, any fraud was statistically undetectable. A good example of this is the work of Hausmann and Rigobon (2011), where the authors claimed to have found statistical evidence of fraud. According to experts consulted by The Wall Street Journal, "the Hausmann/Rigobon study is more credible than many of the other allegations being thrown around" (Luhnow and De Cordoba, 2004). However, their early claim (Hausmann and Rigobon, 2004) was later rejected by The Carter Center [(2005), Appendix 4] and by Weisbrot et al. (2004).

The first peer-reviewed article devoted to the statistical analysis of the referendum data (Febres and Marquez, 2006) concluded that there is statistical evidence for rejecting the official results. This article, in International Statistical Review, made no mention of the paper by Taylor (2005), which concluded explicitly that there is no evidence of fraud. Taylor's paper is the best-known reference on the subject and was widely covered by the media, in part because he was asked to investigate the allegations of fraud on behalf of The Carter Center. Another well-known reference is a paper by Felten et al. (2004), which did not detect any statistical inconsistency that would indicate obvious fraud in the election. However, three papers in this issue of Statistical Science (Delfino and Salas, 2011; Prado and Sanso, 2011; Pericchi and Torres, 2011) support the claim of fraud. Who is right?

The statistical papers on the referendum can be grouped into two classes: those that only use vote counting and those that use related additional data. Five of the papers mentioned above cover the different claims of fraud investigated by the panel of experts convened by The Carter Center [(2005), Appendix 13]. These are:

(1) Discrepancy between official results and exit polls (Prado and Sanso, 2011) and unexpected correlations between computerized vote counting, the number of signatures for the recall petition and audit results (Delfino and Salas, 2011).

(2) Anomalous distributions of votes among voting notebooks (Febres and Marquez, 2006; Taylor, 2005), including high rates of ties (Taylor, 2005) and failure of fit to Benford's Law for significant digits (Pericchi and Torres, 2011; Taylor, 2005).

I am very skeptical about the use of data from other sources. To make a long story short, below I mention only key facts that can be extracted from the Comprehensive Report of The Carter Center:

The months previous to the referendum were highly polarized, with mass rallies for and against the government, and with aggressive campaigns to attract new voters and to intimidate and persecute both signers (people who signed the recall petition) and supporters of President Chavez. Even the referendum day was heated. The electoral actors took ad hoc decisions that generated suspicion and a lack of confidence in the whole process.

In this political atmosphere, we must assume that any unofficial information will be controversial. If there are many doubts about the official results, one cannot expect consensus on other data. Furthermore, one must be very careful with the statistical assumptions that one will use.

This article has two purposes: (1) to bring order to the ruckus caused by different statistical analyses, some of them carried out by non-experts, and (2) to examine the allegations of fraud by a proper forensic analysis. Section 2 reviews the referendum framework, introduces the main notation used throughout this paper and presents a critical review of the five papers cited above. In Section 3 we propose a methodology, based exclusively on vote counting, to test the recall referendum of 2004. The presidential elections of 1998 and 2000 are also reviewed. Far from being a statistical headache, the referendum is an excellent dataset for exercising a wide variety of elementary but powerful statistical tools. Additionally, the case study is useful for illustrating some common mistakes in stochastic modeling. Section 4 summarizes the main findings and conclusions.

2. REFERENDUM FRAMEWORK AND CRITICAL REVIEW

The electoral process is fully described in the report of The Carter Center (2005). The crucial features for the present analysis are:

(i) A voting center consists of one or more electoral tables, and each table consists of one, two or three voting notebooks, which are the official data units with the lowest number of votes.

(ii) Within the time allowed, voters registered at a center. Voters usually chose a center close to their residence or workplace, many of them long before the referendum. When the registration period was over, the electoral umpire decided the number of voting notebooks in each center according to the number of voters. In addition, notebooks are grouped in tables (no more than three per table), mainly for logistical reasons related to the voting process.

(iii) In each center, voters are randomly assigned to the notebooks.¹

(iv) There were only two options to vote: YES or NO. Although there was a very small percentage of invalid votes (0.3%), there was a significant percentage of abstentions (30%).

(v) The voting notebooks were either computerized (touch-screen voting machines, which collected 86% of the valid votes) or manual (ballot boxes, which represented 14% of the votes).

¹ Every Venezuelan citizen is assigned an ID number. These numbers are assigned in sequential order by date of request, usually when a Venezuelan girl or boy is ten years old. In particular, the number is independent of the entire electoral process. The ID number of a voter (aged 18 or older) has up to nine digits and, except in cases of extreme longevity, at least six digits. The mechanism used to assign voters to notebooks can be described as follows: voters were distributed uniformly among the notebooks according to the last two digits of their ID number. For example, in a center with four notebooks, a voter whose ID ended in 00 through 24 was assigned to notebook 1, a voter whose ID ended in 25 through 49 was assigned to notebook 2, and so on.
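The assignment rule described in the footnote on ID numbers can be sketched as follows. This is a minimal illustration, assuming the 100 possible two-digit endings are split into consecutive blocks of (nearly) equal size, as in the four-notebook example; the function name `notebook_for_id` is hypothetical, and the integer-division split used when 100 is not divisible by the number of notebooks is an assumption, not a documented rule.

```python
def notebook_for_id(voter_id: int, n_notebooks: int) -> int:
    """Return the 1-based notebook index for a voter from the last two ID digits.

    Assumption (hypothetical helper): endings 00-99 are cut into n_notebooks
    consecutive blocks of nearly equal size via integer arithmetic.
    """
    if n_notebooks < 1:
        raise ValueError("a center needs at least one notebook")
    last_two = voter_id % 100            # last two digits, 0..99
    return last_two * n_notebooks // 100 + 1


# Four-notebook example: 00-24 -> 1, 25-49 -> 2, 50-74 -> 3, 75-99 -> 4.
print(notebook_for_id(12345624, 4))  # -> 1
print(notebook_for_id(12345625, 4))  # -> 2
print(notebook_for_id(9876599, 4))   # -> 4
```

Because the last two digits of a sequentially issued ID carry no information about the voter, a rule of this kind is what makes hypothesis (iii) plausible.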

Both (i)-(ii) and (iv)-(v) are simple true facts, but (iii) is a statistical hypothesis. The secrecy of the ballot lies in the random assignment of voters to notebooks. For this reason, (iii) is essential for a fair election. Thus, we assume it is true throughout our analysis, with the exception of Sections 3.5-3.7, where we suppose there were irregularities in the allocation of voters to notebooks.

Next, let us introduce the basic notation used throughout this paper. To do so, I will use the term polling unit generically in the next three items to refer to a center, a table or a notebook.

- Let $Y_i$ be the number of YES votes (those favoring recalling President Chavez) and $N_i$ the number of NO votes in polling unit $i$.

- Let $T_i = Y_i + N_i$ be the total number of valid votes in polling unit $i$ and $\bar{T}_i$ the number of voters assigned to that polling unit (the size of the polling unit). Note the difference between voters and valid votes.

- Let $O_i = \bar{T}_i - T_i$ be the number of invalid votes and abstentions in polling unit $i$. For brevity, we refer to them as the OUT votes (out of the electoral consultation).
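The notation above maps naturally onto a small record type. The sketch below is only an illustration: the class name `PollingUnit`, its field names and the counts in the usage line are hypothetical, and the same structure serves for a center, a table or a notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PollingUnit:
    """Official counts for one polling unit (center, table or notebook)."""
    yes: int      # Y_i, YES votes
    no: int       # N_i, NO votes
    voters: int   # bar T_i, voters assigned to the unit (its size)

    @property
    def valid(self) -> int:
        """T_i = Y_i + N_i, the valid votes."""
        return self.yes + self.no

    @property
    def out(self) -> int:
        """O_i = bar T_i - T_i, invalid votes plus abstentions (OUT votes)."""
        return self.voters - self.valid


unit = PollingUnit(yes=180, no=260, voters=640)  # hypothetical counts
print(unit.valid, unit.out)  # -> 440 200
```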

In the rest of this section, where we review different papers, the subscript can refer to centers, tables or notebooks. However, in Section 3 the subscripts are used only to identify voting notebooks.

2.1 Discrepancies Between Two Exit Polls and Official Results

Prado and Sanso (2011) addressed the controversial discrepancy between two independent exit polls and the official results. Roughly, the official result was 41% YES votes and 59% NO votes, while the exit polls gave 61% YES votes. The polls were collected by a political party (Primero Justicia) and a non-governmental organization (Sumate), both in opposition to President Chavez. The authors' main claims are:

C1: There was no selection bias in choosing the centers to be polled.

C2: The discrepancies per center cannot be explained by sampling errors.

C1 is settled by noting that the proportion of YES votes for the overall population matches the proportion of YES votes for the polled centers.

Claim C2 is addressed by assuming that the sampling distribution of the number of YES answers for a given polled center $i$, say $y_i$, is a Binomial$(t_i, p_i)$ random variable. The parameters of this Binomial are $t_i$, the size of the sample collected at the center, and $p_i$, the proportion of YES votes, namely $p_i = Y_i/T_i$. Under this assumption, Prado and Sanso (2011) showed that there are significant differences between the official results and the exit polls in about 60% of the 497 polled centers. The authors also considered the pairwise comparison between the two exit polls among the common polled centers (27 in total). We remark that eight of them (30%) differ significantly.
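To make claim C2 concrete, the sketch below computes an exact two-sided binomial p-value for a single center under the binomial assumption: $y_i$ YES answers out of $t_i$ interviews, tested against $p_i = Y_i/T_i$. The counts in the usage line are hypothetical, and summing every outcome no more likely than the observed one is one common convention for two-sided exact tests; it is not necessarily the procedure used by Prado and Sanso (2011).

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exit_poll_pvalue(y: int, t: int, Y: int, T: int) -> float:
    """Exact two-sided p-value for y YES answers among t interviews, assuming
    y ~ Binomial(t, p) with p = Y / T taken from the official count."""
    p = Y / T
    probs = [binom_pmf(k, t, p) for k in range(t + 1)]
    observed = probs[y]
    # Add up every outcome that is no more likely than the observed one;
    # the small relative tolerance guards against floating-point near-ties.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical center: 61 YES among 100 interviews; official 410 YES of 1000 valid.
print(exit_poll_pvalue(61, 100, 410, 1000))
```

Note that the p-value is only as good as the binomial model behind it, which is exactly what the following discussion questions.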

It appears that Prado and Sanso had the following assumptions in mind to determine that $y_i$ is Binomial with the parameters above:

A1: Given a polling center, the persons to interview were selected by simple random sampling.

A2: Each interviewed person responded to the question truthfully.

A careful reading of Section 2 of Prado and Sanso (2011) suggests that the sample at each center may correspond to a more complex design than simple random sampling. How could the model used affect the estimates and conclusions of their analysis? If, for example, and as seems to be the case, stratified sampling was used, the answer will depend on the stratification scheme and the allocation criteria used by the pollsters (Lohr, 2004). In the absence of concrete information, the assumption of the binomial distribution is the most reasonable one. However, we cannot ignore the uncertainty about the model and, consequently, about the sampling errors computed under A1.

The authors discussed briefly the consequences of A2 not holding. "It has been demonstrated repeatedly that non-response can have large effects on the results of a survey" (Lohr, 2004). It is quite possible that, in a highly polarized political climate, voters who supported Chavez were more prone to non-response, since they could identify the pollsters as members of the opposition to Chavez. Unfortunately, Prado and Sanso had no estimates of non-response and so had to ignore its effects.

Other sources of voter selection bias and measurement error are discussed in their paper. Some of them could imply a systematic bias across the pollsters. Such is the case of the late closing of the voting centers:

The voting centers had to be open until 4:00 p.m., but the electoral umpire extended the closing time twice, first until 9:00 p.m. and finally until midnight. This was not foreseen by the pollsters, and during the afternoon and evening there was a fierce campaign to promote the attendance of the supporters of President Chavez at the voting centers (The Carter Center, 2005).

Prado and Sanso (2011) also studied this possibility, but the available data are very limited. Although the statistical procedure and motivation are correct, missing data can produce results that have no validity at all (De Veaux and Hand, 2005).

It is hard to believe that the discrepancies between 
exit polls and official results are due to sampling 
and random non-sampling errors. Unfortunately the 
information about exit polls is limited and does not 
allow a more rigorous analysis. 

2.2 YES Votes Versus Number of Signers in the Recall Petition

Delfino and Salas (2011) focused on the association between the YES votes and the number of signers of the recall petition.² In the first four sections of their paper the authors described the electoral process well. However, from the fifth section onward, I have major concerns.

Let $S_i$ be the number of signers in voting center $i$. The authors considered the following two relative numbers of YES votes and signers:

$$k_i = \frac{Y_i}{S_i} \quad\text{and}\quad s_i = \frac{S_i}{T_i}. \qquad (1)$$

They conducted a bivariate data analysis with $k$ as a response variable and $s$ as an input variable. Since $k \le 1/s$, they said: "In voting centers with a large value of s, we expect a value of k around 1... The situation is completely different in voting centers with a small value of s. The singularity can produce very high values of k in the neighborhood of s = 0. Hence the level of uncertainty in k becomes very large." Later on, they added: "The computerized centers are very far away from 1/s, clearly contradicting the expected non-linear behavior with respect to s." Finally, they claimed fraud because the data contradict this behavior and even ventured to establish a hypothesis: "In computerized centers, official results were forced to follow a linear relationship with respect to the number of signatures."

² For readers who do not know the intricacies of the referendum, the signatures were collected eight months before the referendum. Many signatures were invalidated and some signers had to sign again in a second round (The Carter Center, 2005).

What can justify the previous conjecture? All that we really know is that the range of $k$ is larger when $s$ decreases. How can we infer the expected nonlinear behavior of $k$ with respect to $s$ from this fact? As is shown in equation (4) of their paper,

$$k_i = \frac{p_i}{s_i},$$

$p_i = Y_i/T_i$ being the proportion of YES votes in center $i$. Then, for fixed $p$, $k$ varies as $1/s$, of course, but it increases with $p$, and there is a strong relation between $p$ and $s$. In fact, as we explain next, one expects the value of $k$ to be constant with respect to $s$, which not only shows that the conjecture of Delfino and Salas (2011) is false but also shows that the observed results are as expected.
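For completeness, the relation above follows in one line from the definitions in (1), taking $s_i = S_i/T_i$ with $T_i$ the valid votes (as the relation $S = sT$ used later in this section indicates), and the same line yields the bound on $k$ invoked by Delfino and Salas:

$$k_i = \frac{Y_i}{S_i} = \frac{Y_i/T_i}{S_i/T_i} = \frac{p_i}{s_i}, \qquad\text{so}\qquad k_i s_i = p_i \le 1 \quad\text{and}\quad k_i \le \frac{1}{s_i}.$$

The bound only limits how large $k$ can be; by itself it says nothing about how $k$ should behave as $s$ varies.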

Following their scheme, we analyze the (full) computerized centers and (full) manual centers separately. Manual centers are peculiar: they usually correspond to remote locations and have a much smaller number of votes than the computerized ones (Prado and Sanso, 2011). For this reason, many authors perform a separate analysis of these data. There was also a small number of mixed centers, where there were both manual and computerized notebooks. These centers represent only 1.26% of the total YES votes and 1.3% of the valid votes, and are excluded in what follows.

Let $\gamma_1 = 389{,}862$ and $\gamma_2 = 3{,}548{,}811$ be the total YES votes in manual and computerized centers, respectively. Consider also the total number of signers in manual and computerized centers, which we denote by $\theta_1\gamma_1$ and $\theta_2\gamma_2$, so that $\theta_1$ and $\theta_2$ are ratios between total signers and total YES votes. As mentioned before, I am skeptical about the use of data that are not official results of the referendum. So we will assume $\theta_1$ and $\theta_2$ are unknown parameters and will only assign values to them for simulation purposes.

Fig. 1. YES votes versus simulated signatures according to the heteroscedastic linear model (4). The left panel corresponds to manual centers with $\theta_1 = 1/1.81$. The right panel corresponds to computerized centers with $\theta_2 = 1/1.15$.

As The Carter Center (2005) remarked, the signers were the hard core of the YES votes. In fact, Delfino and Salas (2011) claimed that "each signature has a high probability of resulting a YES vote." Let us simplify the scenario and assume that each signature in a center was a YES vote in that center. Thus, the ratios $\theta_1$ and $\theta_2$ are less than 1. Under this assumption, the conditional distribution of $S_i$ given $Y_i$ can be fitted by a hypergeometric distribution with parameters $\gamma_c$ (the number of marbles in the hypergeometric jargon), $\theta_c\gamma_c$ (the number of white marbles) and $Y_i$ (the number of draws), $c$ being equal to 1 or 2 according to whether $i$ represents a manual or computerized center. The expected value and variance of the hypergeometric variable are

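$$E[S_i \mid Y_i] = \theta_c Y_i \quad\text{and}\quad \operatorname{Var}[S_i \mid Y_i] = Y_i\,\theta_c(1-\theta_c)\,\frac{\gamma_c - Y_i}{\gamma_c - 1}. \qquad (2)$$

Since $Y_i$ is negligible compared with $\gamma_c$, the finite-population factor $(\gamma_c - Y_i)/(\gamma_c - 1)$ is close to 1, and the standard normal approximation yields

$$\frac{S_i - \theta_c Y_i}{\sqrt{\theta_c(1-\theta_c)Y_i}} \sim \mathcal{N}(0,1) \quad \text{(approximately)}, \qquad (3)$$

$\mathcal{N}(\mu,\sigma^2)$ being a Normal random variable with mean $\mu$ and variance $\sigma^2$. Relation (3) leads us to consider the two heteroscedastic linear models

$$S = \theta_c Y + \mathcal{N}\big(0,\ \theta_c(1-\theta_c)Y\big), \qquad (4)$$

for manual centers ($c = 1$) and computerized centers ($c = 2$).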
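Formula (2), including the finite-population factor $(\gamma_c - Y_i)/(\gamma_c - 1)$ that the normal approximation neglects, can be checked numerically. The sketch below enumerates the hypergeometric probability mass function exactly; the urn sizes are small, purely illustrative values (the real $\gamma_c$ are far too large to enumerate this way, but the formula does not depend on scale), and the helper name `hypergeom_moments` is hypothetical.

```python
from math import comb


def hypergeom_moments(population: int, white: int, draws: int) -> tuple[float, float]:
    """Exact mean and variance of the number of white marbles obtained in `draws`
    draws without replacement from an urn of `population` marbles, `white` of them white."""
    total = comb(population, draws)
    lo = max(0, draws - (population - white))
    hi = min(draws, white)
    pmf = {s: comb(white, s) * comb(population - white, draws - s) / total
           for s in range(lo, hi + 1)}
    mean = sum(s * q for s, q in pmf.items())
    var = sum((s - mean) ** 2 * q for s, q in pmf.items())
    return mean, var


# Illustrative urn: gamma = 200 YES votes, theta = 0.6 (120 signers), Y_i = 30 draws.
gamma, theta, y = 200, 0.6, 30
mean, var = hypergeom_moments(gamma, round(theta * gamma), y)
formula_var = y * theta * (1 - theta) * (gamma - y) / (gamma - 1)  # formula (2)
binomial_var = y * theta * (1 - theta)                             # factor dropped
print(mean, theta * y)
print(var, formula_var, binomial_var)
```

With these toy numbers the factor is about 0.85; for the actual referendum totals, where $Y_i$ is a tiny fraction of $\gamma_c$, it is essentially 1, which is why (3) can drop it.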

For each center, we simulated the number of signatures given the number of YES votes at the center using (4). Typical outcomes of these simulations with $\theta_1 = 1/1.81$ and $\theta_2 = 1/1.15$ are shown in Figure 1. The values of $\theta_c$ were chosen with the intention of comparing our simulated clouds of points with those shown in Figure 6 of Delfino and Salas (2011). Note that the least squares regression lines of the latter have slopes 1.81 and 1.15 using a reverse relation between the variables, namely $Y = a_c S + b_c + \text{error}$. Thus, we take $\theta_c = 1/a_c$. It is difficult to see how to reject the regression model (4) using statistical testing, even under the classical homoscedastic linear model. The differences between the clouds associated with manual and computerized centers are due to differences in scale and variances, which are included in the heteroscedastic linear model (4). There is nothing mysterious about this difference, contrary to what Delfino and Salas (2011) suggested. Reversing the relationship between $Y$ and $S$ in regression model (4) yields a heteroscedastic linear model

$$Y = \beta_c S + \mathcal{N}(0,\ \sigma_c^2 S). \qquad (5)$$

Dividing by $S$, the above equation becomes

$$k = \beta_c + \frac{1}{\sqrt{S}}\,\mathcal{N}(0,\ \sigma_c^2), \qquad (6)$$

which precisely describes the clouds of points shown in Figures 3 and 5 of Delfino and Salas (2011), with observations around a constant for any value of $s$, although the range of $k$ is larger when $s$ is smaller. In summary, it is expected that $\{k_i\}$ will be constant, with a dispersion that decreases as $1/\sqrt{S}$ (note the difference between $S = sT$ and $s$). Note that, even when $s$ is small, if $T$ is large (as in almost every computerized center), the variance can be small, which explains why computerized centers are more concentrated around the expected value of $k$.
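A small simulation illustrates the conclusion drawn from (4)-(6): under model (4), the ratios $k_i = Y_i/S_i$ scatter around the constant $1/\theta_c$, with a spread that shrinks as the center grows. The sketch assumes hypothetical center sizes, uses $\theta = 1/1.15$ (the value adopted above for computerized centers) and draws from the normal approximation (4) rather than from the exact hypergeometric distribution; `simulate_k` is a hypothetical helper name.

```python
import random
import statistics


def simulate_k(yes_votes: int, theta: float, n_centers: int,
               rng: random.Random) -> list[float]:
    """Draw S = theta*Y + N(0, theta*(1-theta)*Y) for n_centers centers that all
    have Y = yes_votes YES votes, and return the ratios k = Y / S (model (4))."""
    sd = (theta * (1 - theta) * yes_votes) ** 0.5
    ks = []
    for _ in range(n_centers):
        s = theta * yes_votes + rng.gauss(0.0, sd)
        ks.append(yes_votes / s)
    return ks


rng = random.Random(2004)
theta = 1 / 1.15
for y in (100, 1000, 10000):  # hypothetical YES votes per center
    ks = simulate_k(y, theta, 5000, rng)
    print(y, round(statistics.mean(ks), 3), round(statistics.stdev(ks), 4))
```

Each tenfold increase in center size should cut the standard deviation of $k$ by roughly $\sqrt{10}$, while its mean stays near 1.15: a flat cloud whose width narrows, not a curve following $1/s$.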

There is an additional comment related to Figures 3, 4 and 5 of Delfino and Salas (2011) worth making. Note that all right panels have a gap for small values of the input variables (almost without observations). Compare the figures after removing these gaps in both panels. For example, remove the windows with $s < 0.1$ in Figures 3 and 5 and the windows with fewer than 200 total votes in Figure 4. The behavior is very similar for manual and computerized voting centers. Their conclusion about the different behavior of manual and computerized centers seems inaccurate.

Fig. 2. Left panel: Histogram of the YES votes (top), NO votes (middle) and OUT votes (bottom) per notebook. Right panel: Benford's Law for the second digit (solid line) versus relative frequencies of the second digit for YES votes = +, NO votes = × and OUT votes = o.

There are more intriguing statistical arguments in the paper of Delfino and Salas (2011). Although we have focused only on their main claim, I should add a comment related to the data. From the least squares regression lines shown in Figure 6 of Delfino and Salas (2011), one can estimate the total signatures in fully manual or computerized centers (excluding the mixed ones) on which the authors base their study. This total is 3,310,200, close to the 3,467,051 signatures submitted to the electoral umpire (Delfino and Salas, 2011). However, the total number of valid signers was 2,553,051 (The Carter Center, 2005). I leave the conclusion to the reader.

2.3 Anomaly Detection by Benford's Law 

Pericchi and Torres (2011) compared empirical distributions with Benford's Law, which governs the frequency of the significant digits (Hill, 1995). Considering several electoral processes in three countries, the only case compellingly rejected by their test is the NO votes at computerized notebooks in the Venezuelan recall referendum. In addition, they made reference to recent contributions in which compliance with or violation of the law in electoral processes has been studied. Some criticisms related to the use of the law on electoral data (The Carter Center, 2005; Taylor, 2005) were also discussed. As theoretical contributions, the authors obtained a generalization of the law under restrictions on the maximum number of votes per polling station and discussed technical issues related to measuring the fit of the law.

It is important to note that Pericchi and Torres (2011) did not analyze the OUT votes or abstentions. Figure 2 shows the marginal distributions of each voting option per notebook (left panel) and compares the empirical distributions of the second digit with Benford's Law (right panel). Regarding Figure 2:

• As Pericchi and Torres showed, the YES votes conform to the law, while the NO votes do not. However, the strongest and most widespread departure from the law is related to the OUT votes: the $\chi^2$ test statistic for this option is the highest of the three.

• It is known that compliance with the law is more likely when the skewness is positive (Wallace, 2002), and the only distribution with positive skewness is that of the YES votes.
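The second-digit comparison behind Figure 2 can be sketched as follows. Under Benford's Law the second significant digit $d$ has probability $\sum_{k=1}^{9} \log_{10}\big(1 + 1/(10k + d)\big)$, which decreases from about 0.120 for $d = 0$ to 0.085 for $d = 9$. The Pearson statistic below is a generic goodness-of-fit form assumed for illustration; the helper names are hypothetical, and this is not claimed to be the exact procedure of Pericchi and Torres (2011).

```python
import math
from collections import Counter


def benford_second_digit() -> list[float]:
    """P(second significant digit = d), d = 0..9, under Benford's Law:
    the sum over first digits k = 1..9 of log10(1 + 1/(10k + d))."""
    return [sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
            for d in range(10)]


def second_digit_chi2(counts: list[int]) -> tuple[float, int]:
    """Pearson chi-square statistic comparing the second digits of vote counts with
    Benford's Law. Counts below 10 have no second digit and are skipped.
    Returns (statistic, number of counts used)."""
    digits = [int(str(c)[1]) for c in counts if c >= 10]
    if not digits:
        raise ValueError("no counts with at least two digits")
    observed = Counter(digits)
    n = len(digits)
    expected = [n * p for p in benford_second_digit()]
    stat = sum((observed[d] - expected[d]) ** 2 / expected[d] for d in range(10))
    return stat, n


print([round(p, 4) for p in benford_second_digit()])
```

Under the usual asymptotics the statistic would be referred to a $\chi^2$ distribution with 9 degrees of freedom; the caveats discussed next about what a departure from the law actually means apply regardless of how the statistic is computed.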

We should remark that violations of Benford's Law may be due to unbiased errors (Etteridge and Srivastava, 1999). Thus, deviations from the law can arise regardless of whether an election is fair or not (Deckert et al., 2010). On the other hand, there are many types of fraud that cannot be detected by Benford's analysis (Durtschi et al., 2004). So electoral results that conform to the law are not necessarily free of suspicion.

To illustrate the comments above, let us consider results by centers rather than by notebooks. In Figure 3 (left panel) we show the distributions of the number of votes at this aggregation level. Note that now all distributions have positive skewness. In the same figure (right panel) we also show Benford's Law for the second digit and the related empirical distributions of votes per center. All voting options conform to the law. According to this analysis, there is no reason to doubt the official results by center, even though the test suggests the contrary when we use the results by notebook. Is the former a false negative or the latter a false positive? Could unbiased errors in the vote counting by notebooks reproduce such a scenario? Or, conversely, could the results by centers be masking a fraud in notebooks? Benford's test does not address this controversy.

Fig. 3. Left panel: Histogram of the YES votes (top), NO votes (middle) and OUT votes (bottom) aggregated by center. Right panel: Benford's Law for the second digit (solid line) versus relative frequencies of the second digit for YES votes = +, NO votes = × and OUT votes = o.

2.4 Irregularity in the YES Votes Distribution 

Febres and Marquez (2006) tested the distribution of YES votes in the voting notebooks. In a first round, they applied a Z test to compare the proportion of YES votes in each notebook with the proportion of the center to which the notebook belongs. The number of irregular notebooks (notebooks with a proportion significantly different from the proportion of the center) resulting from this round is as expected. Therefore, this analysis suggests no inconsistency. According to the territorial organization of Venezuela, the voting centers are grouped into parishes. The authors subdivided the parishes into clusters of centers, using a criterion that we discuss later. They then applied Pearson's $\chi^2$ test to compare the distribution of YES votes among the notebooks of each cluster with the conditional expected distribution given the overall results by cluster and valid votes by notebook. In this second round, they reported a high percentage of irregular clusters (clusters with an outlying $\chi^2$ statistic). Their main finding was that the irregular clusters favor the NO option. Moreover, they showed a monotone relationship between the proportion of YES votes by cluster and the p-value of the Pearson $\chi^2$ test. After tuning the confidence level to exclude irregular clusters, they estimated the overall result, and the winning option is YES.

As mentioned earlier, voters within the same center were randomly assigned to the notebooks. Thus, each notebook is a random sample without replacement from the voting center population. The framework can be completely different when notebooks are grouped by clusters of centers. If the proportions of YES votes of two centers in the same cluster are not equal, no matter how similar they are, and if the total number of votes by notebook is large enough, any consistent test will detect a significant discrepancy between the proportions in the notebooks and the proportion in the cluster. The authors took care of this fact. They made a trade-off between the homogeneity of the cluster (how similar the proportions of the centers within the cluster are) and the number of votes per notebook. Basically, the clusters were chosen such that the Z test does not detect a significant difference between the proportion of YES votes at the notebook with the greatest number of votes and the cluster proportion. In this way they ensured that each notebook is a representative sample of the cluster. The authors referred to this as the minimum heterogeneity distance for clustering analysis and made reference to the books of Sokal and Sneath (1973) and Press (1982).

Fig. 4. Left panel: Exact probability density function (dashed line) of the $\chi^2$ test statistic related to the cluster with five notebooks described in Table 9 of Febres and Marquez (2006). Probability density function of the reference distribution used by the authors (solid line). The cross marks the observed value of the test statistic. Right panel: Exact cumulative distribution function (dashed line) and usual asymptotic approximation for the $\chi^2$ test statistic (solid line).

I have two concerns about these results. The first is a general concern about the validity of ad hoc mechanisms, which may produce false positives (detecting fraud when none is present); this might be the case here. The second is a technical issue that must be resolved before subscribing to the authors' conclusions.

In the referendum context, the standard cluster units of notebooks are the voting centers. I guess that the authors did not report results at this level of aggregation because they did not observe inconsistencies at this level. In fact, if we apply Pearson's $\chi^2$ test to detect irregular centers, in the same way that the authors applied this test to detect irregular clusters, we do not observe major inconsistencies. Therefore, their results depend on a particular way of clustering the notebooks. Why these clusters instead of others that are more or less homogeneous? Why keep the hierarchical ordering by parishes instead of another ordering more related to political preferences? With these questions I am only trying to illustrate natural doubts that can arise when we introduce ad hoc criteria for grouping notebooks. If the results were independent of the grouping level, this would not matter, but this is not the case.

My second concern is the use of the usual asymptotic distribution of Pearson's $\chi^2$ statistic to determine when an observed value of the test statistic is an outlier. This asymptotic approximation does not hold in the framework that we are considering. In general, it is doubtful that it holds when the multinomial distribution, which is the standard underlying assumption when this test is performed, is replaced by a multivariate hypergeometric distribution (Zelterman, 2006), which is the reference model for the distribution of votes among notebooks. In particular, because all the votes of each cluster are distributed among its notebooks, the correlations are not negligible. Despite this, I do not deny that there is a high percentage of irregular clusters. To illustrate the previous comment, we consider the cluster with five notebooks described in Table 9 of Febres and Marquez (2006). Following the standard asymptotics, the authors used the $\chi^2$ distribution with four degrees of freedom to compute the p-value of the test statistic related to this cluster. We compute the exact distribution of this statistic to compare it with the $\chi^2(4)$ distribution. How to compute the exact distribution is not relevant for now (it is a simple exercise following the discussion in Section 3). The important point here is that an outlier for the $\chi^2(4)$ distribution is also an outlier for the exact distribution (see the left panel of Figure 4). In fact, as the right panel of Figure 4 shows, the test statistic for this cluster is smaller than $\chi^2(4)$ in the usual stochastic order. If we had a similar result for all clusters, then we could ensure that the percentage of irregular clusters is equal to or greater than the percentage reported in the paper. I believe that such a result could be obtained. An alternative would be to compute the exact distribution for each cluster to recompute the p-values and the percentage of outliers. This involves high computational costs, but it would also allow us to test the authors' main claim about a monotone causal relationship between the proportion of YES votes and the p-value.
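The exact-versus-asymptotic comparison described above can be approximated by Monte Carlo. The following is a minimal sketch, not the computation behind Figure 4: it assumes the cluster statistic is Pearson's $X^2$ on the 2 × m table of YES/NO counts per notebook, conditions on the valid votes per notebook, and estimates the exact p-value by reshuffling the YES/NO cards of the cluster. The $\chi^2(4)$ tail uses the closed form $e^{-x/2}(1 + x/2)$. The counts below are hypothetical, since Table 9 of Febres and Marquez (2006) is not reproduced here, and the function names are illustrative.

```python
import math
import random


def pearson_x2(yes, valid):
    """Pearson's X^2 for the 2 x m table of YES/NO counts per notebook of a cluster."""
    p = sum(yes) / sum(valid)  # pooled YES proportion of the cluster
    x2 = 0.0
    for y, t in zip(yes, valid):
        e_yes, e_no = p * t, (1 - p) * t
        x2 += (y - e_yes) ** 2 / e_yes + ((t - y) - e_no) ** 2 / e_no
    return x2


def shuffle_yes(total_yes, valid, rng):
    """Deal the cluster's YES/NO cards at random, keeping valid votes per notebook fixed."""
    cards = [1] * total_yes + [0] * (sum(valid) - total_yes)
    rng.shuffle(cards)
    counts, start = [], 0
    for t in valid:
        counts.append(sum(cards[start:start + t]))
        start += t
    return counts


def chi2_4_sf(x):
    """Upper tail P(X >= x) of a chi-square with 4 degrees of freedom (closed form)."""
    return math.exp(-x / 2) * (1 + x / 2)


def exact_pvalue_mc(yes, valid, n_sim=1000, seed=0):
    """Monte Carlo estimate of P(X^2 >= observed) under random shuffling of the cards."""
    rng = random.Random(seed)
    observed = pearson_x2(yes, valid)
    hits = sum(
        pearson_x2(shuffle_yes(sum(yes), valid, rng), valid) >= observed
        for _ in range(n_sim)
    )
    return (hits + 1) / (n_sim + 1)  # add-one correction keeps the estimate positive


# Hypothetical five-notebook cluster (NOT the data of Table 9 in Febres and Marquez).
yes = [150, 160, 90, 155, 148]
valid = [400, 410, 395, 405, 398]
x2 = pearson_x2(yes, valid)
print(x2, chi2_4_sf(x2), exact_pvalue_mc(yes, valid, n_sim=500))
```

Comparing `chi2_4_sf(x2)` with the Monte Carlo p-value over many simulated clusters is one way to see whether the asymptotic tail is conservative for a given cluster layout.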






The conjectures of Febres and Marquez are interesting and point in a concrete direction, but they require further analysis before they can be raised to conclusions of fraud.

2.5 Too Many Ties? 

Taylor (2005) considered the following six models 
of "fair elections": 

T1. A model in which the YES/NO votes in computerized notebooks are independent and identically distributed Poisson random variables, with common expectation according to the results in the country.

T2. The same model as above but with a common distribution which is not necessarily Poisson.

T3. A model in which the YES/NO votes in the notebooks of each electoral table are independent and identically distributed Poisson random variables, with common expectation according to the results in the table.

T3.1. A model in which the distribution of YES/NO is multinomial, splitting up the YES/NO votes of each electoral table equally among the notebooks.

T4. A multivariate hypergeometric model, conditioned on the results per electoral table and valid votes per notebook.

T5. A parametric bootstrap where the total votes of notebooks $\{T_i\}$ are generated according to the integer part of a multivariate Normal distribution. Then the YES votes in notebook i are sampled according to a Binomial($T_i$, p), p being the proportion of YES votes in the electoral table.

Although Taylor's paper does not always say so explicitly, T3-T5 are conditioned on the official results by electoral table, and T4 is additionally conditioned on the official number of valid votes by notebook.
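To make the contrast between T3.1 and T4 concrete, here is a minimal Python sketch of how one electoral table could be simulated under each model. It assumes the table's YES and NO totals and, for T4, its valid votes per notebook are given; the function names and the example valid-vote counts are hypothetical, and this is not Taylor's code.

```python
import random
from collections import Counter


def sample_t31(table_yes, table_no, m, rng):
    """T3.1: each YES and each NO vote of the table goes to one of m notebooks
    uniformly at random (multinomial with equal cell probabilities)."""
    yes = Counter(rng.randrange(m) for _ in range(table_yes))
    no = Counter(rng.randrange(m) for _ in range(table_no))
    return [yes[j] for j in range(m)], [no[j] for j in range(m)]


def sample_t4(table_yes, valid, rng):
    """T4: multivariate hypergeometric draw, given the table's YES total and the
    valid votes of each notebook (YES/NO cards shuffled within the table)."""
    cards = [1] * table_yes + [0] * (sum(valid) - table_yes)
    rng.shuffle(cards)
    yes, start = [], 0
    for t in valid:
        yes.append(sum(cards[start:start + t]))
        start += t
    return yes, [t - y for t, y in zip(valid, yes)]


rng = random.Random(42)
valid = [400, 410, 395]          # hypothetical valid votes per notebook
print(sample_t31(500, 705, 3, rng))
print(sample_t4(500, valid, rng))
```

Under `sample_t4` the valid votes of every notebook always equal the inputs, whereas under `sample_t31` they fluctuate from draw to draw, which is the issue raised in point (b) below.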

From these models, the author analyzed different statistical anomalies related to claims of fraud. Two of them have been previously discussed in this section (Febres and Marquez, 2006; Pericchi and Torres, 2011). The third is related to high rates of YES ties: a YES tie is a perfect match of YES votes between two notebooks. Accordingly, his analysis can be divided into three parts:

• Global test for goodness of fit for models T3 and T3.1.

• Comparative study between the distribution of the significant digits according to T3.1 (and also to a slight improvement of T1), the observed distribution and Benford's Law.

• Computation of the expected number of electoral tables with one or more YES ties, for each model, and comparison with the observed number of ties.

His main results and conclusions can be summarized as follows:

R1. "The more powerful $\chi^2$ test" strongly rejects the Poisson model T3. However, a False Discovery Rate analysis (Benjamini and Hochberg, 1995) shows "there is not evidence of widespread departures for the Poisson model." This result "shows no systematic fraud in the form of vote-capping."

R2. The distribution of the significant digits of the multinomial model T3.1 does not conform to Benford's Law and is virtually identical to the observed distribution. Thus, Benford's Law is of "little use in fraud detection in this instance."

R3. The Z scores used to compare the observed number of electoral tables with one or more YES ties with the expected number according to his models "are fairly high" (with the exception of the Z score related to T4, which is equal to 2.37). "This only means that we can reject the global null hypothesis" (i.e., the global models) "and not that there indeed was fraud."
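For reference, the Benford distributions against which observed and simulated digits are compared in R2 can be computed directly. The first-digit law is $P(D_1 = d) = \log_{10}(1 + 1/d)$ and the second-digit law is $P(D_2 = d) = \sum_{k=1}^{9} \log_{10}(1 + 1/(10k + d))$. The sketch below, with illustrative helper names, also extracts the empirical digit distribution from a list of vote counts; which significant digit each author used is not restated here, so both laws are shown.

```python
import math


def benford_first_digit():
    """P(D1 = d) = log10(1 + 1/d) for d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def benford_second_digit():
    """P(D2 = d) = sum_{k=1..9} log10(1 + 1/(10k + d)) for d = 0..9."""
    return [sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
            for d in range(10)]


def significant_digit(count, position):
    """Digit at 1-based `position` of a positive count, or None if it has too few digits."""
    s = str(count)
    return int(s[position - 1]) if count > 0 and len(s) >= position else None


def empirical_distribution(counts, position):
    """Observed frequencies of the digit at `position` (1..9 for the first, 0..9 otherwise)."""
    digits = [d for d in (significant_digit(c, position) for c in counts) if d is not None]
    support = range(1, 10) if position == 1 else range(10)
    return [digits.count(d) / len(digits) for d in support]


print([round(p, 3) for p in benford_second_digit()])
```

Note that the second-digit law is much flatter than the first-digit law, which is one reason why small deviations from it are hard to interpret.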

First of all, the validity of a statistical model is 
not entirely justified by the fact that it fits the data, 
especially if one wants to test the quality of those 
data. The costs, here associated with a false nega- 
tive (failing to identify a fraud condition when one 
exists), are too high. The model should at least not 
be at odds with our knowledge about the system 
that is being modeled. 

According to Taylor's web page,³ "the first two models" (T1 and T2) "are clearly unrealistic." The next two (T3 and T3.1) are also unrealistic:

(a) As discussed in the previous subsection, the assumption of independence among the notebooks is meaningless. There are constraints on the sums of votes across an electoral table: all the votes of each table are distributed among its notebooks, so the correlations are not negligible.

(b) The number of voters (note again the difference between voters and votes) per notebook varies among the notebooks of the same electoral table, so it makes no sense to split the votes of a table equally among the notebooks.



3 http://www-stat.stanford.edu/~jtaylo/Venezuela.
The last two models (T4 and T5) take into account (a) and (b). In particular, I agree that the multivariate hypergeometric approach used in T4 is the right way to generate vote configurations. However, T5 resorts to assumptions that can be questionable, such as the use of the integer parts of multivariate Normal random variables to generate valid votes by notebook. Given that the two models provide similar results according to his own analysis, we will apply the principle of Occam's razor⁴ to reduce his list to just one realistic model.

What can we conclude when a questionable dataset does not show evidence of widespread departures from an unrealistic model? What if the distributions of the significant digits are similar to each other but differ from Benford's Law? The conclusions in R1 and R2 are baseless. We cannot conclude anything useful from these analyses.

Let us move to R3, where he considers the multivariate hypergeometric model and simulations carried out by Felten et al. (2004) to analyze the YES ties phenomenon. These simulations show that the number of electoral tables with one or more ties is high, but not high enough to be considered a sign of fraud (according to this model, around 1% of cases can have an equal or greater number of tables with YES ties). This part of his analysis did not detect extreme statistical anomalies that would indicate obvious fraud in the referendum. Of course, as Felten et al. emphasized, this does not imply the absence of fraud, either.
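The YES-ties comparison in R3 can be outlined as a Monte Carlo test under the multivariate hypergeometric model T4: simulate each table given its YES total and its valid votes per notebook, count the tables with at least one YES tie, and compare with the observed count. This is a hedged sketch; the `tables` input format (YES total of the table, list of valid votes per notebook), the example tables and the function names are assumptions, and it is not the simulation code of Felten et al. (2004).

```python
import random


def has_yes_tie(yes_counts):
    """True if two notebooks of the same table report the same number of YES votes."""
    return len(set(yes_counts)) < len(yes_counts)


def sample_t4_yes(table_yes, valid, rng):
    """YES votes per notebook under T4 (hypergeometric given valid votes per notebook)."""
    cards = [1] * table_yes + [0] * (sum(valid) - table_yes)
    rng.shuffle(cards)
    counts, start = [], 0
    for t in valid:
        counts.append(sum(cards[start:start + t]))
        start += t
    return counts


def tie_tail_probability(tables, observed_ties, n_sim=1000, seed=0):
    """Monte Carlo estimate of P(#tables with a YES tie >= observed_ties) under T4.
    `tables` is a list of (YES total of the table, valid votes per notebook)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        ties = sum(has_yes_tie(sample_t4_yes(y, valid, rng)) for y, valid in tables)
        exceed += ties >= observed_ties
    return exceed / n_sim


# Two hypothetical tables; with real data this would loop over all electoral tables.
tables = [(300, [400, 410]), (250, [380, 390, 400])]
print(tie_tail_probability(tables, observed_ties=1, n_sim=500))
```

A tail probability around 0.01, as reported for the referendum under this model, indicates a high but not extreme number of ties.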

3. REEXAMINING THE REFERENDUM 

The purpose of this section is to reevaluate the claim of fraud. An electoral fraud occurs if the results are altered to favor one of the options. If there is evidence that the changes are enough to overturn the winner, the outcome of the referendum should not be recognized. Moreover, if the manipulation does not change the winner but changes the proportions significantly, it must still be considered fraud. Electoral results can drastically affect future electoral processes. In particular, this could have happened during the Venezuelan parliamentary elections, one year after the referendum, in which the political parties that supported the YES option withdrew, claiming the possibility of new fraud. Also, a tight result can have a different political meaning than an outcome with a winner by a wide margin, especially in a recall referendum. At the end of this section we evaluate the hypothesis of irregularities in the vote counting that significantly favor the NO option. We begin by describing the joint probability distribution of results per notebook, conditioned on the complete set of information of each center. This corresponds to a multivariate hypergeometric model, similar to that used in Felten et al. (2004) and Taylor's T4 model (the differences are explained below). This is a key tool in the hypothesis testing methodology that we develop throughout this section.

4 Entia non sunt multiplicanda praeter necessitatem (entities must not be multiplied beyond necessity).

3.1 Shuffling Voting Cards 

Consider a center with m notebooks, labeled by 1, 2, ..., m. Let $v = \sum_{i=1}^{m} \tau_i$ be the total voters in the center, $\tau_i$ being the number of voters in notebook i. Identify each voter by a number in {1, 2, ..., v} such that the first $\tau_1$ voters are in notebook 1, the following $\tau_2$ voters are in notebook 2, and so on. In the vote counting, each voter is represented by a voting card according to her/his electoral option. It can be YES, NO or OUT. Let $X_i$ be the voting card of voter i. Then, the vote configuration at the center can be represented by

$$ X = \overbrace{(\underbrace{X_1, \ldots, X_{\tau_1}}_{\text{notebook } 1}, \ldots, \underbrace{X_{v-\tau_m+1}, \ldots, X_v}_{\text{notebook } m})}^{v \text{ voters}}. \tag{7} $$

Let $y = \sum_{i=1}^{m} Y_i$ be the total YES votes in the center. Similarly, let $n = \sum_{i=1}^{m} N_i$ be the total NO votes. Then, X is an outcome of shuffling the voting cards of the center:

$$ C = \overbrace{(\underbrace{\mathrm{YES}, \ldots, \mathrm{YES}}_{y \text{ YES's}}, \underbrace{\mathrm{NO}, \ldots, \mathrm{NO}}_{n \text{ NO's}}, \underbrace{\mathrm{OUT}, \ldots, \mathrm{OUT}}_{v-y-n \text{ OUT's}})}^{v \text{ voters}}. \tag{8} $$

That is to say that X is a permutation of C.

According to the random mechanism used by the 
electoral umpire to assign voters to notebooks, given 
(y,n,v), any permutation of voting cards has the 
same probability of occurring. This is the underlying 
statistical principle shared by Febres and Marquez 
(2006), Felten et al. (2004) and Taylor (2005) for 
testing the referendum data. However, these authors 
do not consider all possible permutations: 

• The sampling distribution of the test used by Febres and Marquez (2006) in their first round, where they conditioned on results by centers and valid votes by notebooks, corresponds to sampling on the set of outcomes of shuffling YES cards and NO cards in centers, leaving OUT cards fixed in notebooks.

• The samples from the multivariate hypergeometric model considered by Felten et al. (2004) and Taylor (2005) belong to a set of permutations even smaller than the previous one. They conditioned additionally on the total of YES votes and NO votes by electoral tables. That is, they just considered shuffling YES cards and NO cards in tables, also leaving OUT cards fixed in notebooks.

Both approaches fail to consider a large number of equiprobable results that match the referendum results at the centers. In this paper, we compute sampling distributions of test statistics considering all possible permutations of the voting cards at each center. To simplify the writing, in what follows we will refer to the result obtained by randomly shuffling the cards within each center, independently across all centers, as a random sample of the electoral process.
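A random sample of the electoral process, in the sense just defined, can be drawn by dealing all the cards of each center among its notebooks. The sketch below assumes each center is given as (voters per notebook $\tau_i$, total YES y, total NO n); `shuffle_valid_only` shows, for contrast, the restricted scheme that keeps the valid votes (and hence the OUT cards) of each notebook fixed. All names are illustrative.

```python
import random


def shuffle_center(tau, y, n, rng):
    """Deal y YES, n NO and v - y - n OUT cards among notebooks with tau[i] voters;
    every permutation of the center's cards is equally likely. Returns (Y, N, OUT)."""
    v = sum(tau)
    cards = ["YES"] * y + ["NO"] * n + ["OUT"] * (v - y - n)
    rng.shuffle(cards)
    result, start = [], 0
    for t in tau:
        chunk = cards[start:start + t]
        result.append((chunk.count("YES"), chunk.count("NO"), chunk.count("OUT")))
        start += t
    return result


def shuffle_valid_only(yes, no, rng):
    """Restricted scheme for contrast: only YES/NO cards move, so the valid votes
    (and hence the OUT cards) of each notebook stay fixed. Returns (Y, N)."""
    valid = [a + b for a, b in zip(yes, no)]
    cards = ["YES"] * sum(yes) + ["NO"] * sum(no)
    rng.shuffle(cards)
    result, start = [], 0
    for t in valid:
        k = cards[start:start + t].count("YES")
        result.append((k, t - k))
        start += t
    return result


def sample_electoral_process(centers, seed=0):
    """One random sample of the electoral process: an independent full shuffle per
    center. `centers` is a list of (voters per notebook, total YES, total NO)."""
    rng = random.Random(seed)
    return [shuffle_center(tau, y, n, rng) for tau, y, n in centers]


# Center 7990 of Table 2: voters per notebook, total YES, total NO.
print(sample_electoral_process([([607, 600, 610], 490, 717)]))
```

The full shuffle preserves only the center totals and the voters per notebook, so it explores every configuration that matches the official results at the center.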

3.2 Statistical Hypothesis of Fair Referenda 

If we assume that the referendum was properly conducted, the results by notebook correspond to a random sample of the electoral process. Therefore, the hypothesis of a properly conducted referendum is

$\mathcal{H}_0$: The votes per notebook correspond to a random sample of the electoral process.

But the rejection of $\mathcal{H}_0$ does not imply that the results per notebook were altered to favor one of the options. It only implies that there is a significant presence of outliers in the distribution of votes per notebook. Innocent irregularities, such as the incorrect allocation of voters to notebooks, can generate such outliers. We consider the most innocent alternative to $\mathcal{H}_0$, assuming that: (1) there is a significant presence of outliers in the votes per notebook, (2) the outliers are the result of neutral irregularities, and (3) the irregularities affect a random set of notebooks, regardless of whether they belong to strongholds of the winning option or not. Therefore, we consider the hypothesis of an atypical fair referendum, namely,

$\mathcal{H}_1$: There is a significant presence of outliers in the votes per notebook that is a consequence of innocent irregularities that affect a random set of notebooks.

If there is in fact a significant presence of outliers, we can reject $\mathcal{H}_1$ because: (1) the irregularities are not innocent, introducing a significant bias in the vote counting, or (2) they affect mostly notebooks in bastions of one of the options. Therefore, we have to consider the bizarre, but fair, scenario in which the irregularities that generate the outliers are neutral, no matter what, and, for some reason, they affect mostly a set of notebooks that are in strongholds of one of the options. Thus, we consider the hypothesis of a bizarre but fair referendum:

$\mathcal{H}_2$: The significant presence of outliers in the votes per notebook is the result of innocent irregularities that affect mostly a set of notebooks from strongholds of one electoral option.

The remaining alternative is a clear signal that the irregularities are not innocent.

Before testing the hypotheses, we describe the dataset.

3.3 Description of the Dataset 

Shuffling voting cards requires at least two notebooks per center, so we restrict our analysis to these centers. In addition, since all allegations of fraud are related to computerized notebooks, we only consider fully computerized centers (centers where there are no manual votes). We also exclude a very small number of centers with empty notebooks (notebooks without valid votes). Empty notebooks could arise from technical problems, affecting the distribution of voters to notebooks in such centers. After this simple cleaning of fully computerized centers with two or more notebooks, a consistent dataset is obtained with 4,162 centers, all of them with comparable notebooks. This means that the numbers of votes in the notebooks of a center are of the same order of magnitude. These 4,162 centers comprise 18,297 notebooks, covering more than 83% of the total voters, and they will be the basis of the study. The mean and the standard deviation of the number of voters per notebook are 634 and 73, respectively.
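The three cleaning rules above can be expressed as a single filter. A minimal sketch, assuming a hypothetical record layout in which each center has a `manual_votes` count and a list of `(yes, no, voters)` tuples, one per notebook; these field names are illustrative and not taken from the official data files.

```python
def keep_center(center):
    """Cleaning rules of Section 3.3 on a hypothetical record layout:
    center = {"manual_votes": int, "notebooks": [(yes, no, voters), ...]}."""
    notebooks = center["notebooks"]
    return (
        len(notebooks) >= 2                                # shuffling needs 2+ notebooks
        and center["manual_votes"] == 0                    # fully computerized center
        and all(yes + no > 0 for yes, no, _ in notebooks)  # no empty notebooks
    )


# Hypothetical records built from Table 2 counts.
centers = [
    {"manual_votes": 0, "notebooks": [(174, 272, 607), (81, 70, 600)]},
    {"manual_votes": 0, "notebooks": [(191, 396, 588)]},
    {"manual_votes": 12, "notebooks": [(60, 137, 583), (62, 143, 567)]},
]
print([keep_center(c) for c in centers])  # [True, False, False]
```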

For the last two subsections of this section, we will also use the results of the presidential elections of 1998, in which Chavez was elected to his first term as President of Venezuela with 56% of valid votes against a coalition of roughly the same political parties that supported the recall. This election was carried out with an automated voting system, which featured a single integrated electronic network to transmit the results from the polling stations to central headquarters (McCoy, 1999). The legitimacy of the electoral process and the acceptance of the results by political parties and international observers are a guarantee of the reliability of the results. At that time, in each center, voters were also randomly assigned to polling units according to the last number of their ID, similarly to the process described in Section 2. As we do with the referendum's dataset, we exclude centers with only one unit poll and those with empty unit polls. After the cleaning, a dataset is obtained with 3,952 centers and 15,667 unit polls, which represent 85% of the total voters. The mean and the standard deviation of the number of voters per unit poll are 594 and 112. For all these reasons, both datasets are comparable for the statistical purposes of Section 3.6.

A different scenario overshadowed the presidential elections of 2000, which we consider in Section 3.7, in which Chavez was elected to his second term with 59% of valid votes. After two years of important political changes, including the enactment of a new constitution, criticism of Chavez's government increased, polarizing the political climate. Many claims of fraud, including machines not functioning properly, people whose names did not appear on the electoral registry, and pre-marked ballots, were made at that time. While The Carter Center does not believe that the election irregularities would have changed the presidential results, it considered those elections flawed and not fully successful (Neuman and McCoy, 2001). The election was carried out with roughly the same voting system used in 1998. However, there was an important difference in the number of voters per unit poll, which significantly increased the number of centers with only one unit. As with the referendum and the presidential elections of 1998, we exclude centers with only one unit poll and those with empty unit polls. In addition, we exclude unit polls with more votes than voters. Thus, we obtain a dataset with 1,600 centers and only 3,730 unit polls, which represent 53% of the total voters.

The three datasets provide high-precision estimates of the overall percentage of votes for each electoral option. Table 1 summarizes their main statistics.

Table 1
Statistical summary of the datasets

Year   Unit polls   % of total voters   Mean of voters per unit poll   Standard deviation
1998   15,667       85%                 594.70                         111.97
2000    3,730       53%                 1662.50                        361.74
2004   18,297       83%                 634.60                         73.86

3.4 Testing the Hypothesis of a Properly Conducted Referendum

The way to test irregularity is to determine whether an observed value is an outlier or not. Let i be a focal notebook and c the center to which it belongs:

- Since voters of a center are randomly assigned to notebooks, $Y_i$ is the total of YES cards in a simple random sample (without replacement) of size $\tau_i$ from the voting cards of the center. In particular, $E[Y_i \mid \mathcal{H}_0] = p_c \tau_i$, with $p_c = y_c / v_c$, and
$$ \mathrm{Var}[Y_i \mid \mathcal{H}_0] = \tau_i\, p_c (1 - p_c) \frac{v_c - \tau_i}{v_c - 1}. $$

- The minimum $\tau_i$ in the 18,297 notebooks involved is 347 (the mean value $\bar{\tau} = \sum_i \tau_i / 18{,}297$ is equal to 634.60, and the maximum is 975).

Coupling these facts, a straightforward application of the Central Limit Theorem implies that, under $\mathcal{H}_0$, the score

$$ Z_i = \frac{Y_i - p_c \tau_i}{\sqrt{p_c (1 - p_c)\, \tau_i (v_c - \tau_i) / (v_c - 1)}} \tag{9} $$

is approximately $\mathcal{N}(0, 1)$, for any i. Therefore, a test of regularity for a single notebook reduces to determining the significance of $Z_i$.
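As a numerical check of (9), the short sketch below computes the YES Z scores of every notebook in a center from its YES counts and voters per notebook. With the official counts of centers 7990 and 1123 reported in Table 2, it reproduces the extreme values -9.08 and 10.54 quoted in the text. The function name is illustrative.

```python
import math


def z_scores(counts, tau):
    """Z_i of equation (9) for every notebook of one center.
    counts[i]: YES votes of notebook i; tau[i]: voters of notebook i."""
    v = sum(tau)            # voters in the center, v_c
    p = sum(counts) / v     # p_c = y_c / v_c
    return [
        (x - p * t) / math.sqrt(p * (1 - p) * t * (v - t) / (v - 1))
        for x, t in zip(counts, tau)
    ]


z_7990 = z_scores([174, 81, 235], [607, 600, 610])
z_1123 = z_scores([191, 60, 233, 62], [588, 583, 594, 567])
print(round(min(z_7990), 2), round(max(z_1123), 2))  # -9.08 10.54
```

The same function gives the expected values quoted below: $p_c \tau_m = 600 \times 490/1817 \approx 161.81$ and $p_c \tau_M = 594 \times 546/2332 \approx 139.08$.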

To get an overall qualitative idea of the joint behavior of the Z scores under $\mathcal{H}_0$, the normal probability plot of these statistics from a random sample of the electoral process is shown in Figure 5 (left panel). In the same figure (right panel), the normal probability plot of the scores based on the observed values is also shown. This plot highlights many official results far from what is expected. Let us look more closely at the most atypical cases. Table 2 shows the official results of centers 7990 and 1123,⁵ which contain the notebooks associated with the minimum (-9.08) and maximum (10.54) Z values. Let us call these notebooks m and M, respectively. Under $\mathcal{H}_0$, the expected value of $Y_m$ is 161.81, almost twice the observed value of 81, while the expected value of $Y_M$ is 139.08, only about 60% of the observed value of 233.

5 We use the center codes of the referendum. Codes, as well as the list of centers, vary from election to election.

Fig. 5. Normal probability plot of Z scores based on a random sample (left) and observed values (right).

Table 2
Results in centers with notebooks associated with minimum (*) and maximum (+) Z value

              Center 7990             Center 1123
Notebook            m                                M
Y             174   81*   235      191    60    233+    62
N             272   70    375      396    137   359     143
T             607   600   610      588    583   594     567

An overall comparison is obtained by summing the squares of the Z scores. Let

$$ S^2 = \sum_{i=1}^{18{,}297} Z_i^2. \tag{10} $$

Since each $Z_i$ is standardized with its exact mean and variance, a straightforward computation gives $E[S^2 \mid \mathcal{H}_0] = 18{,}297$. The variance can be estimated by Monte Carlo, shuffling the voting cards. We performed 1,000 random samples of the electoral process and obtained a standard deviation of 216. Next we show that the sampling distribution of the test statistic

$$ T^{YES} = \frac{S^2 - 18{,}297}{216} \tag{11} $$

can be approximated by a standard Normal distribution.

The centers have between 2 and 18 notebooks. The distribution of the centers according to the number of notebooks is shown in Table 3.

The sum of squares can be decomposed as follows:

$$ S^2 = \sum_{i=1}^{18{,}297} Z_i^2 = \chi^2_{nb(1)} + \chi^2_{nb(2)} + \cdots + \chi^2_{nb(18)}, $$

$\chi^2_{nb(i)}$ being the sum, over all the centers, of the squares of the Z scores related to the ith notebook of a center. Although the results of notebooks belonging to the same center are correlated, given $\mathcal{H}_0$ they are independent of the results in other centers. In turn, each $\chi^2_{nb(i)}$ is a sum of independent random variables, each of which is approximately the square of a standard normal random variable. Then, $\chi^2_{nb(i)}$ is approximately $\chi^2$ with $\sum_{m \ge i} C_m$ degrees of freedom, $C_m$ being the number of centers with m notebooks. Table 4 lists the degrees of freedom related to $\{\chi^2_{nb(i)}, 1 \le i \le 18\}$.
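The degrees of freedom in Table 4 follow from Table 3 alone: $\chi^2_{nb(i)}$ has one term per center with at least i notebooks, so its degrees of freedom are $\sum_{m \ge i} C_m$, and summing them over i recovers the total number of notebooks. The sketch below checks this and also computes the share of the 18,297 $Z^2$ terms contained in $\chi^2_{nb(1)}, \ldots, \chi^2_{nb(10)}$, which is the figure used in the second of the remarks that follow. Variable names are illustrative.

```python
# Number of centers C_m with m notebooks (Table 3).
C = {2: 1044, 3: 820, 4: 665, 5: 496, 6: 380, 7: 300, 8: 208, 9: 110, 10: 54,
     11: 41, 12: 19, 13: 12, 14: 4, 15: 4, 16: 2, 17: 2, 18: 1}

# chi2_nb(i) has one Z^2 term per center with at least i notebooks.
df = {i: sum(c for m, c in C.items() if m >= i) for i in range(1, 19)}

n_notebooks = sum(m * c for m, c in C.items())
share_first_ten = sum(df[i] for i in range(1, 11)) / n_notebooks

print(n_notebooks, df[1], df[10], df[18], round(share_first_ten, 3))
# 18297 4162 139 1 0.99
```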

In general, approximating the distribution of sums of correlated $\chi^2$ variables can be difficult. Fortunately, this is not the case here. Two remarks:

- For i < 10, the degrees of freedom are large enough to approximate the distribution of $\chi^2_{nb(i)}$ by a Normal.

- $\chi^2_{nb(1)} + \cdots + \chi^2_{nb(10)}$ accounts for 99% of the $Z^2$ statistics in $S^2$.

Therefore, $S^2$ is approximately a sum of Normal random variables. Letting $\hat{\sigma}^2$ be the sample variance obtained from k independent samples of $S^2$, under $\mathcal{H}_0$, the test statistic

$$ T^{YES} = \frac{S^2 - E[S^2 \mid \mathcal{H}_0]}{\hat{\sigma}} \tag{12} $$

is approximately $\mathcal{N}(0, 1)$ for any large k. As we said above, we simulated 1,000 random samples of the electoral process, obtaining $\hat{\sigma} = 216$. We also used the samples to confirm that, under $\mathcal{H}_0$, the distribution of $T^{YES}$ is approximately $\mathcal{N}(0, 1)$. For that, we tested normality with different methods, all of them with the same conclusive results. To illustrate, Figure 6 compares the kernel density estimator of the probability density function of $T^{YES}$ with the probability density of a standard Normal.
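The Monte Carlo procedure behind (10)-(12) can be sketched as follows: reshuffle the cards within every center, recompute $S^2$, estimate its standard deviation from the simulations, and standardize the observed $S^2$ using $E[S^2 \mid \mathcal{H}_0]$ = number of notebooks. Because $Z_i$ only distinguishes YES cards from the rest, it is enough to deal the YES cards among the voters. This is an illustrative sketch on just the two centers of Table 2, with hypothetical function names and far fewer simulations than the 1,000 used in the paper; it is not the code used for the reported results.

```python
import math
import random
import statistics


def z_yes(yes, tau):
    """YES Z scores (equation 9) of the notebooks of one center."""
    v = sum(tau)
    p = sum(yes) / v
    return [(x - p * t) / math.sqrt(p * (1 - p) * t * (v - t) / (v - 1))
            for x, t in zip(yes, tau)]


def shuffled_yes(tau, y, rng):
    """YES counts per notebook after dealing y YES cards among sum(tau) voters."""
    cards = [1] * y + [0] * (sum(tau) - y)
    rng.shuffle(cards)
    counts, start = [], 0
    for t in tau:
        counts.append(sum(cards[start:start + t]))
        start += t
    return counts


def s2(all_yes, all_tau):
    """S^2 of equation (10): sum of squared Z scores over all notebooks."""
    return sum(z * z for yes, tau in zip(all_yes, all_tau) for z in z_yes(yes, tau))


def t_yes(all_yes, all_tau, n_sim=200, seed=0):
    """T^YES of equation (12), with the SD of S^2 estimated by Monte Carlo."""
    rng = random.Random(seed)
    sims = [
        s2([shuffled_yes(tau, sum(yes), rng) for yes, tau in zip(all_yes, all_tau)], all_tau)
        for _ in range(n_sim)
    ]
    expected = sum(len(tau) for tau in all_tau)  # E[S^2 | H0] = number of notebooks
    return (s2(all_yes, all_tau) - expected) / statistics.stdev(sims)


# Official YES counts and voters per notebook for the two centers of Table 2.
all_yes = [[174, 81, 235], [191, 60, 233, 62]]
all_tau = [[607, 600, 610], [588, 583, 594, 567]]
print(t_yes(all_yes, all_tau))
```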

The observed value of $T^{YES}$, according to the official results, is $T^{YES}_{obs} \approx 13.12$, which establishes that the results of YES votes per notebook are not credible, given the results by centers. The p-value, smaller than MATLAB's numerical precision, is strong evidence against $\mathcal{H}_0$.

Table 3
Number of clean and fully computerized centers with m notebooks

m       2      3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18
$C_m$   1,044  820  665  496  380  300  208  110  54   41   19   12   4    4    2    2    1

Table 4
Degrees of freedom (df) related with $\chi^2_{nb(i)}$

i    1      2      3      4      5      6      7    8    9    10   11   12   13   14   15   16   17   18
df   4,162  4,162  3,118  2,298  1,633  1,137  757  457  249  139  85   44   25   13   9    5    3    1

Fig. 6. Kernel estimator of the probability density function of $T^{YES}$ versus a standard Normal probability density.

Following the same approach, we can test regularity in the distribution of NO votes and abstentions. For that, we define the Z statistics

$$ Z_i^{NO} = \frac{N_i - q_c \tau_i}{\sqrt{q_c (1 - q_c)\, \tau_i (v_c - \tau_i) / (v_c - 1)}} \quad \text{and} \quad Z_i^{OUT} = \frac{O_i - r_c \tau_i}{\sqrt{r_c (1 - r_c)\, \tau_i (v_c - \tau_i) / (v_c - 1)}}, \tag{13} $$

$O_i$ being the OUT votes in notebook i, and $q_c = n_c / v_c$ and $r_c = (v_c - y_c - n_c) / v_c$ being the proportions of NO votes and OUT votes in the center c to which notebook i belongs. As an illustration, Figure 7 shows the normal probability plots of these Z statistics based on a random sample of a properly conducted referendum. The figure also shows the normal probability plots of the scores based on the official results. These plots show a widespread departure from the expected values, even stronger than for the YES case (note the different scales of these figures). In fact, if we define test statistics for the distributions of NO votes and OUT votes, $T^{NO}$ and $T^{OUT}$ respectively, in the same way as for the YES votes, then we have

$$ T^{YES}_{obs} < T^{NO}_{obs} < T^{OUT}_{obs}. $$

Clearly, $\mathcal{H}_0$ can be completely rejected.
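Applying (9) and (13) to the official counts of center 1123 (Table 2) gives all three families of scores at once, with the OUT votes per notebook obtained as $\tau_i - Y_i - N_i$. Since the numerators of each family sum to zero within a center, the $Z^{OUT}$ values of a center cannot all share the same sign, so extreme values should be read in absolute value. Section 3.6 reports that every notebook of this center has $|Z^{OUT}|$ above 18.53; the sketch below, with illustrative function names, can be used to check that figure.

```python
import math


def z_scores(counts, tau):
    """Standardized counts of one card type per notebook, as in (9) and (13)."""
    v = sum(tau)
    p = sum(counts) / v
    return [(x - p * t) / math.sqrt(p * (1 - p) * t * (v - t) / (v - 1))
            for x, t in zip(counts, tau)]


def center_z(yes, no, tau):
    """Z^YES, Z^NO and Z^OUT for every notebook of a center; OUT = voters - YES - NO."""
    out = [t - y - n for t, y, n in zip(tau, yes, no)]
    return {"YES": z_scores(yes, tau), "NO": z_scores(no, tau), "OUT": z_scores(out, tau)}


# Center 1123 (Table 2): YES, NO and voters per notebook.
z = center_z([191, 60, 233, 62], [396, 137, 359, 143], [588, 583, 594, 567])
print({kind: [round(x, 2) for x in values] for kind, values in z.items()})
```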

3.5 Testing the Hypothesis of an Atypical 
Fair Referendum 

As mentioned previously, the widespread departure of YESs, NOs and OUTs per notebook from their expected values could be the outcome of innocent irregularities in the conduct of the referendum. The incorrect allocation of voters to notebooks and the transfer of votes from one notebook to another during the vote counting, caused by programming bugs, are examples of such irregularities. These irregularities may generate, in particular, $Z^{OUT}$ outliers but, because of the secrecy of the ballot, they should not be associated with a trend in the vote counting. Next, we propose a testing methodology, based on a simple statistical control chart, for testing for a trend in the vote counting on potentially irregular notebooks. The methodology can be easily extended to other electoral audit frameworks. It relies on the assumption that unexpected irregularities can occur in any unit poll with the same probability.

Denote by R the ratio between NO votes and total valid votes in the target population, consisting of K = 18,297 notebooks, namely,

$$ R = \frac{\sum_{i=1}^{K} N_i}{\sum_{i=1}^{K} T_i}. \tag{14} $$

In sampling jargon, R is the population ratio and K is the size of the population. Let $\mathcal{S}_k$ be the sample consisting of the k notebooks with the most extreme $Z^{OUT}$ values, that is, the set of k notebooks with $Z^{OUT}$ values furthest away from zero. Given a confidence level $1 - \alpha$, there is a $k := k(\alpha)$ such that $\mathcal{S}_k$ matches the set of notebooks whose $Z^{OUT}$ values we consider outliers, that is, the set of notebooks with $Z^{OUT}$ values outside the $(1 - \alpha) \times 100\%$ normal confidence interval. In our case study, if the confidence level is 99%, then k = 706; roughly 4% of the $Z^{OUT}$ values are outside the 99% confidence interval. In what follows, k varies in a range such that $\mathcal{S}_k$ corresponds to the set of outliers according to some reasonable confidence level.

Fig. 7. Normal probability plot of $Z^{NO}$ (left) and $Z^{OUT}$ (right) scores based on a random sample (top) and observed values (bottom).

Denote by $r_k$ the sample ratio based on $\mathcal{S}_k$. That is,

$$ r_k = \frac{\sum_{i \in \mathcal{S}_k} N_i}{\sum_{i \in \mathcal{S}_k} T_i}. \tag{15} $$

Note that $r_k$ is not the usual ratio estimator, since we are sampling notebooks with atypical $Z^{OUT}$ values. Thus, we might expect the observations $(N_i, T_i)$ in $\mathcal{S}_k$ to be larger or smaller than those from a simple random sample (SRS). However, if the irregularities are innocent, that is, if they do not introduce bias in the vote counting, $r_k$ should be similar to the sample ratio based on an SRS. In particular, if k is large, the bias of the estimator will be small and its variance can be approximated by

$$ \mathrm{Var}(r_k) \approx S_k^2 = \left(1 - \frac{k}{K}\right) \frac{s_k^2}{k\, \bar{T}^2} $$

(Lohr, 2004), with

$$ \bar{T} = \frac{1}{K} \sum_{i=1}^{K} T_i \quad \text{and} \quad s_k^2 = \frac{1}{k - 1} \sum_{i \in \mathcal{S}_k} (N_i - r_k T_i)^2. $$

Thus, under the hypothesis $\mathcal{H}_1$ defined in Section 3.2, if k and K - k are large enough,

$$ \zeta_k = \frac{r_k - R}{S_k} \tag{16} $$

is approximately distributed as a standard normal variable. In what follows, we will only consider $100 \le k \ll K$.
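Equations (14)-(16) translate directly into code. The sketch below selects $\mathcal{S}_k$ as the k notebooks with the largest $|Z^{OUT}|$, then computes $r_k$, the ratio-estimator variance $S_k^2$ and $\zeta_k$. It assumes per-notebook lists of NO votes $N_i$, the denominators $T_i$ of (14), and precomputed $Z^{OUT}$ scores; the function name and the toy data are hypothetical.

```python
import math


def zeta_k(no, t, z_out, k):
    """zeta_k of equation (16) for the k notebooks with the largest |Z^OUT|.
    no[i] = N_i, t[i] = T_i as in (14)-(15), z_out[i] = Z^OUT of notebook i."""
    K = len(no)
    R = sum(no) / sum(t)                                       # population ratio (14)
    S = sorted(range(K), key=lambda i: abs(z_out[i]), reverse=True)[:k]
    r_k = sum(no[i] for i in S) / sum(t[i] for i in S)         # sample ratio (15)
    t_bar = sum(t) / K
    s2 = sum((no[i] - r_k * t[i]) ** 2 for i in S) / (k - 1)
    var = (1 - k / K) * s2 / (k * t_bar ** 2)                  # S_k^2
    return (r_k - R) / math.sqrt(var)


# Toy example: the two most extreme notebooks lean NO, so zeta_k > 0.
print(zeta_k([40, 60, 50, 50], [100, 100, 100, 100], [0.1, 5.0, -4.0, 0.2], k=2))
```

Plotting `zeta_k` for a range of k gives the control-chart style trajectory discussed below.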

To illustrate the above, consider 1,000 independent copies of $\zeta_{500}$ from random samples of atypical fair referenda. We simulate an atypical fair referendum by introducing 700 innocent irregularities on a random sample of the electoral process. Each irregularity consists of passing a random proportion of votes (10% on average) from one notebook to another located in the same center. This manipulation produces a significant number of $Z^{OUT}$ outliers (outside the 99% normal confidence interval) in addition to those already present before the manipulation. As an example, the normal probability plot of the $Z^{OUT}$ scores of one of these atypical fair referenda is shown in Figure 8 (left panel). As we can see, the shape of the plot is similar to that observed for the referendum (Figure 7, bottom right panel). We tested the normality of $\zeta_{500}$ with different methods, all of them with the same conclusive positive results. To illustrate, the right panel of Figure 8 compares the kernel density estimator of the probability density function of $\zeta_{500}$ with the probability density of a standard Normal.

Fig. 8. Left panel: Normal probability plot of $Z^{OUT}$ scores from an atypical fair referendum. Right panel: Standard normal probability density versus kernel estimator of the probability density function of $\zeta_{500}$, based on 1,000 independent copies of an atypical fair referendum.
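One way to generate the innocent irregularities described above is sketched below. The text specifies only that a random proportion of votes, 10% on average, is passed between two notebooks of the same center. The uniform distribution on (0, 0.2), moving the same fraction of YES and NO votes (so that the transfer is neutral), and the cap that keeps the valid votes of the receiving notebook below its number of voters are assumptions of this sketch, and the function name is hypothetical. Voters per notebook are unchanged, so the OUT counts of both notebooks move, which is what produces $Z^{OUT}$ outliers.

```python
import random


def innocent_irregularity(yes, no, tau, rng, mean_fraction=0.10):
    """Move the same random fraction of YES and NO votes from one notebook to another
    notebook of the same center (a neutral transfer). Voters per notebook (tau) are
    unchanged, so both notebooks' OUT counts shift. The Uniform(0, 2 * mean_fraction)
    fraction and the capacity cap are assumptions of this sketch."""
    yes, no = list(yes), list(no)
    src, dst = rng.sample(range(len(yes)), 2)
    room = tau[dst] - yes[dst] - no[dst]                 # OUT cards left in the receiver
    f = min(rng.uniform(0, 2 * mean_fraction),
            room / max(1, yes[src] + no[src]))           # keep valid votes <= voters
    moved_yes, moved_no = int(f * yes[src]), int(f * no[src])
    yes[src] -= moved_yes
    no[src] -= moved_no
    yes[dst] += moved_yes
    no[dst] += moved_no
    return yes, no


rng = random.Random(7)
print(innocent_irregularity([174, 81, 235], [272, 70, 375], [607, 600, 610], rng))
```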

We can test the hypothesis of an atypical fair referendum ($\mathcal{H}_1$) using the $\zeta_k$ scores. High values of $\zeta_k$ imply that irregularities introduce a bias in favor of the NO option in the vote counting. Small values of this score imply a bias in favor of the YES option. Under $\mathcal{H}_1$, we expect $\zeta_k$ to be within a confidence interval.

The $\zeta_k$ scores corresponding to the official results are plotted in Figure 9 (top line), for k between 100 and 706. To illustrate the behavior that we expect in an atypical fair referendum, we plot, in the same figure, 100 simulated score series of the atypical fair referendum discussed above. The $\zeta_k$ scores corresponding to the official results are well above the 99.99% confidence interval (-3.9, 3.9) for $100 \le k \le 706$. Although a small number of simulated trajectories also reach high values, all of them are contained in (-3.9, 3.9) and most of them are in the 99% confidence interval (-2.58, 2.58), as one expects. We observed similar behavior in 1,000 additional simulated trajectories (not plotted). The score series of the referendum reaches values higher than any that we observed in simulations, and it is the only one that stays well above 3.9 for all $100 \le k \le 706$. This provides stronger evidence against $\mathcal{H}_1$ than a small p-value of a single $\zeta_k$ score, for some k, would. We are seeing a significant bias in the vote counting on notebooks associated with irregularities, which is almost impossible to observe under $\mathcal{H}_1$. All of the above is strong evidence for rejecting it.

Fig. 9. $\zeta_k$ versus k for official results (top line) and simulated atypical fair referenda.
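The interval bounds quoted above are two-sided standard normal quantiles, and they can be checked with Python's standard library. The last line gives the two-sided tail probability beyond 3.9 for a single standard normal score, for comparison with a trajectory that stays above that bound for every k.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
print(round(std_normal.inv_cdf(1 - 0.01 / 2), 2))    # 2.58 (99% two-sided)
print(round(std_normal.inv_cdf(1 - 0.0001 / 2), 2))  # 3.89 (99.99% two-sided)
# Two-sided tail probability beyond 3.9 for a single standard normal score.
print(2 * (1 - std_normal.cdf(3.9)))
```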

3.6 Testing the Hypothesis of Bizarre but Fair 
Referendum 

Most political scientists expect more innocent administrative errors in areas with more poor voters (M. Lindeman, personal communication, July 2010). In addition, "the conventional wisdom about contemporary Venezuelan politics is that class voting has become commonplace, with the poor doggedly supporting Hugo Chavez while the rich oppose him" (Lupu, 2010). If both beliefs are true, we expect more innocent irregularities in strongholds of the NO option, which would explain the atypical result observed in the previous section. That is what $\mathcal{H}_2$ describes: a general scenario in which there are more innocent irregularities in centers that support the winning option. To illustrate this possibility, we show in Table 5 the results in Center 1123 (C. M. A. Dr. Angel Vicente Ochoa, in Santa Rosalia, Caracas), one of the most extreme results. All its notebooks are associated with very extreme $Z^{OUT}$ values (greater than 18.53 in absolute value!). But the overall NO proportion (65%) is even less than Chavez's share in this center in the presidential elections of 1998 (67%). This center appears to be a bona fide Chavez stronghold. However, we have to remark that, in the 1998 election, the $Z^{OUT}$ values of that center are quite normal, all of them between -0.40 and 0.40. The results, with only three unit polls in that election, are shown in Table 6.

Table 5
Results in Center 1123 (C. M. A. Dr. Angel Vicente Ochoa, in Santa Rosalia, Caracas)

Y    191   60    233   62
N    396   137   359   143
T    588   583   594   567

Table 6
Presidential elections of 1998, results in C. M. A. Dr. Angel Vicente Ochoa Center

     357   265   247
T    899   645   624

A naive procedure to check whether the irregularities mostly affected notebooks in Chavez's strongholds would be to repeat the previous analysis on all centers with outliers. We consider an alternative analysis for an important reason: if tampering did occur, the irregularities may have been committed precisely in Chavez's bastions in order to mask them. In addition, we have to admit that we do not have access to the re-coding of centers needed to automate the procedure.

Lupu (2010) provided evidence that the presidential election of 1998 was more monotonic in class voting than the referendum. That is, the poor were more likely to vote for Chavez in 1998 than in 2004. Thus, we expect more innocent irregularities in Chavez's strongholds in 1998 than in 2004. In addition, there is no doubt about the legitimacy of this election (Neuman and McCoy, 2001). For these reasons, the election of 1998 is very appropriate for testing whether irregularities mostly affect notebooks in centers that support Chavez, that is, for testing H2, defined in Section 3.2. The testing scheme we use is to reject H2 if we fail to reject H1 for the elections of 1998. We begin by verifying that there is a significant presence of Z OUT outliers in 1998: 5% of Z OUT values (797 of 15,667) are outside the 99% normal confidence interval. The evidence against H0 is of the same order as in 2004. Furthermore, the most extreme Z values of 1998 are higher than those observed in 2004. We omit the details and summarize the results by showing the normal probability plot of the Z OUT scores in Figure 10 (left panel). It seems possible that, in complex elections, ad hoc decisions are made to resolve problems that arise on the fly. As we have discussed previously, this can produce large outliers in the vote distribution. However, the test discussed in the previous section strongly supports H1 for the presidential elections of 1998. The corresponding score series {C_k, 100 ≤ k ≤ 797} lies almost entirely within the 99% confidence interval; see the right panel of Figure 10. Therefore, there is little reason to think that the Z OUT outliers, which result from innocent irregularities, mostly affect notebooks from Chavez's strongholds. Irregularities seem to occur randomly, regardless of whether the notebook belongs to a Chavez bastion or not, and thus we reject H2.
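If the Z OUT scores behaved like standard normal draws, only about 1% of them would fall outside the 99% normal confidence interval; the 1998 count of 797 out of 15,667 (about 5%) is roughly five times that. The short Python sketch below shows the count itself, assuming the Z OUT scores have already been computed as described earlier in the paper (their construction is not reproduced here). The function name `outlier_count` is hypothetical, and the scores in the usage example are toy values, not election data.

```python
from statistics import NormalDist
from typing import Sequence, Tuple


def outlier_count(z_scores: Sequence[float], level: float = 0.99) -> Tuple[int, float]:
    """Count scores outside the central `level` interval of a standard normal.

    Returns (number of outliers, fraction of outliers).
    """
    # Two-sided critical value; about 2.576 when level = 0.99.
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    n_out = sum(1 for z in z_scores if abs(z) > z_crit)
    return n_out, n_out / len(z_scores)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    scores = [0.1, -1.2, 2.9, -3.4, 0.7]
    print(outlier_count(scores))  # (2, 0.4)
    # Arithmetic behind the 1998 figure quoted in the text.
    print(round(797 / 15667, 3))  # 0.051, versus about 0.01 expected
```

A count well above 1% only shows that the scores are not standard normal. It says nothing about whether the outliers favor one option, which is why the text relies on a separate test for that question.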

3.7 Estimating the Effect of the Irregularities 

We have provided statistical evidence that there was a significant presence of irregularities that favored the winning option in the vote counting of 2004. But how much could the irregularities affect the overall results? Suppose that the Z OUT outliers are the tip of the iceberg and that there is bias in the vote counting of a high proportion of notebooks, not just in the notebooks with extreme Z OUT values. To evaluate this assumption, we analyze the behavior of the sample ratio in (15) for higher values of k than those that we have already considered. Figure 11 shows this proportion for k varying from 100 to 3000 (top line). Its shape reveals a strong correlation between the trend in the vote counting and the discrepancy between valid votes and their expectation, and not only in notebooks with Z OUT outliers. We remark that for k = 100 we are already considering 41,533 valid votes, a very large sample for estimating proportions (an accepted standard for pollsters is above 1,000). What we would expect as the sample size increases is exactly what we see for the presidential elections of 1998 (line from the middle to the bottom in Figure 11): the proportion varies slightly around the population proportion as the sample size grows and quickly stabilizes around this value. So our assumption of irregularities that affect the vote counting across all the notebooks is quite possibly true.⁶ Let us measure how much it could affect the totals.

Fig. 10. Left panel: Normal probability plot of Z scores of the presidential elections of 1998. Right panel: C_k versus k for presidential elections of 1998.

Fig. 11. Proportion of Chavez's votes in S_k versus k for the referendum (top line) and presidential elections of 1998 (line from the middle to the bottom) and 2000 (line from the bottom to the middle).
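The curves in Figure 11 can be reproduced in outline by sorting notebooks by |Z OUT| and accumulating votes. The sketch below assumes that the sample ratio in (15) is the pooled share of Chavez votes among valid votes in S_k, the k notebooks with the highest |Z OUT| values. This reading follows the caption of Figure 11 and the remark that k = 100 corresponds to 41,533 valid votes; the exact definition (15) appears earlier in the paper. The record layout `(z_out, chavez_votes, valid_votes)` and the function name `ratio_series` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: (z_out, chavez_votes, valid_votes) per notebook.
Notebook = Tuple[float, int, int]


def ratio_series(notebooks: List[Notebook], k_min: int = 100,
                 k_max: Optional[int] = None) -> Dict[int, float]:
    """Pooled share of Chavez votes among valid votes in S_k.

    S_k is taken to be the k notebooks with the largest |z_out|.
    Returns a dict mapping k (k_min <= k <= k_max) to that share.
    """
    ranked = sorted(notebooks, key=lambda nb: abs(nb[0]), reverse=True)
    if k_max is None:
        k_max = len(ranked)
    series: Dict[int, float] = {}
    chavez_total = 0
    valid_total = 0
    # Accumulate votes notebook by notebook, from most to least extreme.
    for k, (_, chavez, valid) in enumerate(ranked[:k_max], start=1):
        chavez_total += chavez
        valid_total += valid
        if k >= k_min:
            series[k] = chavez_total / valid_total
    return series


if __name__ == "__main__":
    # Toy data, not taken from the referendum.
    toy = [(0.5, 50, 100), (-4.0, 90, 100), (2.0, 60, 100)]
    print(ratio_series(toy, k_min=1))  # {1: 0.9, 2: 0.75, 3: 0.6666666666666666}
```

Running sums compute the whole series in one pass after sorting. Under a fair count the series should settle quickly near the overall proportion as k grows, as the 1998 curve does.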

Let R be the population ratio defined in (14). Bounds for the relative error introduced in the vote counting by the irregularities can be obtained by maximizing and minimizing the relative error (r̂_k - R)/R. Thus, we can provide a prediction interval for the corrected proportion of votes in favor of Chavez, namely,

$$
\left[\, R\left(1 - \max_{100 \le k \le K/2} \frac{\hat{r}_k - R}{R}\right),\;
R\left(1 - \min_{100 \le k \le K/2} \frac{\hat{r}_k - R}{R}\right) \right], \tag{17}
$$

K being the total number of notebooks. Note that we are considering up to 50% of the notebooks (k ≤ K/2), namely those with the highest |Z OUT| values.
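Interval (17) is easy to evaluate once the ratios r̂_k are available. Note that R(1 - (r̂_k - R)/R) = 2R - r̂_k, so each bound reflects the most extreme ratio about R: the largest upward deviation gives the lower bound. The sketch below assumes the ratios come as a mapping from k to r̂_k. The function name `prediction_interval` and the toy numbers are hypothetical, chosen only to show the arithmetic.

```python
from typing import Dict, Tuple


def prediction_interval(series: Dict[int, float], R: float, K: int,
                        k_min: int = 100) -> Tuple[float, float]:
    """Bounds of the form in (17): R * (1 - max/min relative error).

    `series` maps k to the sample ratio r_k; only k_min <= k <= K/2 is used.
    """
    # For integer k, k <= K/2 is the same as k <= K // 2.
    rel_errors = [(r_k - R) / R for k, r_k in series.items()
                  if k_min <= k <= K // 2]
    if not rel_errors:
        raise ValueError("no k in [k_min, K/2] available in series")
    # The largest relative error gives the lower bound, and vice versa.
    lower = R * (1 - max(rel_errors))
    upper = R * (1 - min(rel_errors))
    return lower, upper


if __name__ == "__main__":
    # Hypothetical ratios, chosen only to illustrate the arithmetic.
    toy_series = {100: 0.70, 200: 0.66, 300: 0.60, 400: 0.10}
    lo, hi = prediction_interval(toy_series, R=0.59, K=600)
    print(round(lo, 2), round(hi, 2))  # 0.48 0.58 (k = 400 > K/2 is ignored)
```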

For example, the prediction interval (17) for the 
presidential election of 1998, in which Chavez won 
with 56% of valid votes, is [55%, 57%]. We remark 
that this is an example of an atypical but fair elec- 
tion, where the results were well accepted by politi- 
cal parties and international observers. 

Let us consider next the presidential elections of 2000. As mentioned above, The Carter Center considers this election flawed and not fully successful. However, they also emphasize that the irregularities did not change the presidential results. Our methodology confirms their conclusion. Figure 12 summarizes our testing analysis. We observe the highest presence of Z OUT outliers in 2000: 9% of Z OUT values (327 of 3730) are outside the 99% normal confidence interval. Also, the most extreme Z OUT scores of 2000 are higher than those observed in 1998 and 2004. But there is no evidence to reject H1 for this election. The score series always stays within the 99% normal confidence interval, except for a short excursion. Moreover, the prediction interval (17) for this election is [59%, 62%], and Chavez was elected with 59% of the valid votes.

⁶ Martín (2011), which unfortunately was not available for my review, studies the volume of incoming and outgoing data traffic between notebooks and totalizing servers. It provides evidence that the vote counting of a high percentage of notebooks could have been affected from the totalizing servers.

Fig. 12. Left panel: Normal probability plot of Z scores of the presidential elections of 2000. Right panel: C_k versus k for presidential elections of 2000.

We do observe a controversial result in the referendum, which was managed by a different electoral umpire from those that managed the elections of 1998 and 2000: the prediction interval is [47%, 57%]. The official result (59%) is out of range, while results that overturn the winner are within it. We remark that while this is not proof that irregularities changed the overall results, it does illustrate that such a scenario is plausible. At the very least, the official result should be more in line with the prediction interval.

4. CONCLUSIONS

The main tool for reconciling political actors in an election under suspicion of fraud is a full audit. When this is not possible, statistical methods for detecting numerical anomalies and diagnosing irregularities can be useful for evaluating the likelihood of allegations of fraud. This is the aim of election forensics (Mebane, 2008), an exciting area of applied statistics. Election forensics has been applied to several recent controversial elections, including 2004 USA, 2006 Mexico, 2008 Russia and 2009 Iran (Mebane, 2011); see the personal web page of Walter Mebane.⁷ The Venezuelan recall referendum is
a case study that shows a wide palette of the commonly used statistical tools, and of the problems that can arise in this type of analysis, as shown by our review in Section 2. In particular, we have highlighted problems related to exit polls, causal relationships between number of votes and dependent variables, Benford's Law, different levels of data aggregation, goodness of fit, and election modeling. Beyond the statistical lessons, the harsh criticism of some of the papers reviewed reflects a deep concern about the future of this emerging area. I am convinced that the diffusion of inaccurate analyses only causes well-founded allegations of fraud to be undervalued. This was, at least, the case of the Venezuelan referendum.

We propose an election forensics methodology, based only on vote counting, to analyze the referendum. The Venezuelan presidential elections of 1998 and 2000 are also reviewed. Unlike previous work, we used the full information of the official dataset. This consists not only of the number of votes for and against revoking the mandate of President Chavez, but also of the number of abstentions and invalid votes at the official data unit with the lowest number of votes. The main conclusion of the present paper is that there were a significant number of irregularities in the vote counting that introduced a bias in favor of the winning option. We provide prediction intervals for the bias, showing that the scenario in which the bias could overturn the results is plausible. This places solid evidence in the arena, substantiating the allegations of fraud made at the time.

⁷ www-personal.umich.edu/~wmebane